Asymptotic cost of cutting down random free trees

Elahe Zohoorian Azad 
Damghan university of Iran 

January 31, 2012 



Abstract 

In this work, we calculate the limit distribution of the total cost incurred by splitting a tree uniformly distributed on the set of free trees of a given size. This cost appears as an additive functional induced by a toll equal to the square of the size of the tree. The main tools used are recent results connecting the asymptotics of generating functions with the asymptotics of their Hadamard product, and the method of moments.
1 Introduction

Trees are structures suited to data storage and to supporting computer algorithms, two fundamental aspects of data processing, with applications in many fields. The cost of "divide-and-conquer" algorithms can be represented as an additive functional of trees. While there are many studies on additive functionals (see, for example, [ITJ [7] [H]), not enough attention has been given to the distributions of functionals defined on trees under the uniform model. A main motivation for undertaking this investigation, however, is that it is key to analyzing a special type of Drop-Push model of percolation and coagulation (see [15]).

In this paper, we consider the additive functional defined on trees selected uniformly at random from the set of all free trees of size $n$, for given $n$ (called Cayley trees in [5]), induced by the toll sequence $(n^2)_{n\ge 1}$ (see the definition in Section 2). Our main result, Theorem 1, provides the limit distribution of a suitably normalized version of this functional.

Theorem 1. Let $X_n$ be the additive functional defined on uniform free trees of size $n$, induced by the toll $(n^2)_{n\ge 1}$. Then

$$\frac{X_n}{n^{5/2}} \xrightarrow{\ \mathcal{L}\ } \sqrt{2}\,\xi,$$

where $\xi$ is a random variable whose distribution is characterized by its moments:

$$\mathbb{E}(\xi^k) = \frac{\sqrt{\pi}\,k!}{2^{(7k-2)/2}\,\Gamma\!\left(\frac{5k-1}{2}\right)}\,a_k,$$

where

$$a_k = \sqrt{2}\,(5k-6)(5k-4)\,a_{k-1} + \frac{1}{2}\sum_{j=1}^{k-1} a_j\,a_{k-j}, \quad k \ge 2; \qquad a_1 = \sqrt{2}.$$

Curiously, the moments of our limit distribution are proportional to the moments of a functional of the minimum of a normalized Brownian excursion (the random variable $\eta$ of Theorem 2 below), obtained in [8, Theorem 3.3].

In what follows, $e = (e(t))_{0\le t\le 1}$ denotes a normalized Brownian excursion.

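Theorem 2. The moments of the random variable $\eta$, defined by

$$\eta = 4\iint_{0<s<t<1}\ \min_{s\le u\le t} e(u)\,ds\,dt,$$

are given by the formula

$$\mathbb{E}(\eta^k) = \frac{\sqrt{\pi}\,k!}{2^{(7k-4)/2}\,\Gamma\!\left(\frac{5k-1}{2}\right)}\,\omega_k,$$

where

$$\omega_k = 2(5k-6)(5k-4)\,\omega_{k-1} + \sum_{j=1}^{k-1}\omega_j\,\omega_{k-j}, \quad k \ge 2; \qquad \omega_1 = 1.$$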
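The two moment sequences can be compared numerically. The sketch below is only an illustration: the helper names (`omega_seq`, `a_seq`, `moment_xi`, `moment_eta`) are ours, and it assumes the recurrences and moment formulas exactly as written in the docstrings (those of Theorems 1 and 2 above). Under these assumptions the ratio $a_k/\omega_k$ comes out as $2^{1-k/2}$, which is one concrete reading of the proportionality remark: $\mathbb{E}(\xi^k) = 2^{-k/2}\,\mathbb{E}(\eta^k)$.

```python
import math

SQRT2 = math.sqrt(2.0)


def omega_seq(kmax):
    """omega_1 = 1, omega_k = 2(5k-6)(5k-4) omega_{k-1} + sum_{j=1}^{k-1} omega_j omega_{k-j}."""
    w = [0.0, 1.0]  # w[0] is padding so that w[k] = omega_k
    for k in range(2, kmax + 1):
        conv = sum(w[j] * w[k - j] for j in range(1, k))
        w.append(2 * (5 * k - 6) * (5 * k - 4) * w[k - 1] + conv)
    return w


def a_seq(kmax):
    """a_1 = sqrt(2), a_k = sqrt(2)(5k-6)(5k-4) a_{k-1} + (1/2) sum_{j=1}^{k-1} a_j a_{k-j}."""
    a = [0.0, SQRT2]
    for k in range(2, kmax + 1):
        conv = sum(a[j] * a[k - j] for j in range(1, k))
        a.append(SQRT2 * (5 * k - 6) * (5 * k - 4) * a[k - 1] + 0.5 * conv)
    return a


def moment_eta(k, w):
    """E(eta^k) = sqrt(pi) k! omega_k / (2^{(7k-4)/2} Gamma((5k-1)/2))."""
    return (math.sqrt(math.pi) * math.factorial(k) * w[k]
            / (2 ** ((7 * k - 4) / 2) * math.gamma((5 * k - 1) / 2)))


def moment_xi(k, a):
    """E(xi^k) = sqrt(pi) k! a_k / (2^{(7k-2)/2} Gamma((5k-1)/2))."""
    return (math.sqrt(math.pi) * math.factorial(k) * a[k]
            / (2 ** ((7 * k - 2) / 2) * math.gamma((5 * k - 1) / 2)))


K = 8
w, a = omega_seq(K), a_seq(K)
for k in range(1, K + 1):
    # columns: k, a_k / omega_k, 2^{1-k/2}, E(xi^k) / E(eta^k)
    print(k, a[k] / w[k], 2 ** (1 - k / 2), moment_xi(k, a) / moment_eta(k, w))
```

Floating point is enough here because both sequences stay far below overflow for small $k$; exact arithmetic would need a representation of $\sqrt2$.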

It is not unusual, in this kind of problem, to have more than one characterization of a limit distribution. For instance, the limit law of the Wiener index of certain trees is given through its moments, which involve Airy functions, and is alternatively characterized in terms of a Brownian excursion.



For the proof of Theorem 1, we apply the strategy used in [3] to obtain the limiting distributions of additive functionals defined on Catalan trees, in particular the singularity analysis of generating series [6]. Indeed, Hadamard products appear naturally when one analyzes the moments of additive functionals of trees. Theorem 1 extends to moments of all orders the analysis of the asymptotic behavior of the first moment, which was already carried out in [5]. The steps taken here allow a rather mechanical calculation of the asymptotic moments of each order, thus facilitating the application of the method of moments.
2 Generating functions

We first establish some notation. Let $T$ be a binary tree and let $|T|$ denote the number of nodes in $T$. Suppose moreover that $L(T)$ and $R(T)$ denote, respectively, the left and right subtrees rooted at the two children of the root of $T$. When the tree is not binary, one can still obtain two subtrees $L(T)$ and $R(T)$ by cutting an edge, which then plays the role of the root.

Definition 1. A functional $f$ defined on binary trees is called an additive functional if it satisfies the recurrence

$$f(T) = f(L(T)) + f(R(T)) + b_{|T|}$$

for any tree $T$ with $|T| > 1$. Here $(b_n)_{n\ge 1}$ is a given sequence, henceforth called the toll function.

We analyze here a special additive functional on trees uniformly distributed on $\{T : |T| = n\}$, for given $n$. By a result attributed to Cayley [2], there are $U_n = n^{n-2}$ free trees ($U_n$ connected acyclic labelled graphs) on $n$ nodes and, accordingly, there are $T_n = n^{n-1}$ rooted trees (in which one labelled node is distinguished as the root). Consider the model in which initially a free tree of size $n$ is chosen uniformly at random. Choose an edge at random among the $n-1$ edges of the tree, orient it at random, then cut it. This separates the tree into an ordered pair of smaller trees, which are now rooted; we call them the left and right subtrees. Continue the process with each of the resulting subtrees, discarding the root. Assume¹ that the cost incurred by selecting the edge and splitting a tree of size $n$ is $n^2$. Then $X_n$, the total cost incurred for splitting a random tree of size $n$, satisfies, for $n > 1$, the recurrence

$$X_n = X_{L_n} + X_{R_n} + n^2, \tag{1}$$

where $L_n$ and $R_n$ are, respectively, the sizes of the left and right subtrees obtained by dividing the initial tree of size $n$ (given $L_n$, the two costs on the right-hand side are independent). So $X_n$ appears as the additive functional induced by the toll sequence $(n^2)_{n\ge 1}$.
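The splitting process is easy to simulate, which gives an independent sanity check on recurrence (1). The sketch below is our own illustration, not code from the paper: it draws a uniform labelled free tree by decoding a random Prüfer sequence (a standard bijection, swapped in here as the sampling method), then repeatedly cuts a uniformly chosen edge of each remaining component, charging $m^2$ for a component of size $m$. The random orientation of the cut edge is skipped because it does not affect the cost. All function names are hypothetical.

```python
import random


def random_free_tree(n, rng):
    """Uniform labelled free tree on vertices 0..n-1, decoded from a random Pruefer sequence."""
    if n <= 1:
        return []
    if n == 2:
        return [(0, 1)]
    seq = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)  # smallest current leaf
        edges.append((leaf, v))
        degree[leaf] -= 1  # the leaf is removed
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]  # the two remaining vertices
    edges.append((u, w))
    return edges


def cutting_cost(n, edges, rng):
    """Total cost of cutting a tree on vertices 0..n-1 down to single nodes, toll m^2."""
    total = 0
    stack = [(list(range(n)), edges)]
    while stack:
        vertices, es = stack.pop()
        m = len(vertices)
        if m <= 1:
            continue  # b_1 = 0: a single node costs nothing
        total += m * m
        i = rng.randrange(len(es))  # uniform edge among the m - 1 edges
        endpoint = es[i][0]
        rest = es[:i] + es[i + 1:]
        adj = {v: [] for v in vertices}
        for x, y in rest:
            adj[x].append(y)
            adj[y].append(x)
        side = {endpoint}  # component containing one endpoint of the cut edge
        todo = [endpoint]
        while todo:
            x = todo.pop()
            for y in adj[x]:
                if y not in side:
                    side.add(y)
                    todo.append(y)
        # each remaining edge lies entirely on one side of the cut
        stack.append(([v for v in vertices if v in side],
                      [(x, y) for x, y in rest if x in side]))
        stack.append(([v for v in vertices if v not in side],
                      [(x, y) for x, y in rest if x not in side]))
    return total


def simulate_mean(n, reps, seed=0):
    """Monte Carlo estimate of E(X_n)."""
    rng = random.Random(seed)
    return sum(cutting_cost(n, random_free_tree(n, rng), rng) for _ in range(reps)) / reps
```

For example, `simulate_mean(30, 2000)` gives a Monte Carlo estimate of $\mathbb{E}(X_{30})$ that can be set against exact values of $a_n$. For $n = 3$ every cut separates sizes 1 and 2, so the cost is always $9 + 4 = 13$.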

A motivation, coming from the analysis of algorithms, is as follows. If time is reversed, this model describes the evolution of a random graph from a completely disconnected graph to a tree, and it was used to analyze union-find algorithms [3] [13] [H]. Knuth and Schönhage provided a first analysis of it in 1978 ([10]), for different tolls, however.

¹ See [15, Proposition 1] for the main motivation behind this assumption. Briefly, [15] analyzes a Drop-Push model of coagulation in which particles are dropped onto a one-dimensional lattice and perform a random walk until they encounter an empty site, where they become stuck. In such a model, the movements of the particles on the lattice form an additive coalescence process, which gives good algorithmic reasons for considering the recurrence (1). In fact, in the Drop-Push model, the cost of coalescence of two clusters of particles, at the moment a particle is dropped, is given by the number of steps of the particle until it sticks at an empty site, and it is proven [15, relation (8)] that the expected cost of coalescence of two clusters is proportional to the square of the length of the cluster on which the particle drops.

Let $p_{n,k}$ be the probability for a tree of size $n$ to have left and right subtrees of sizes $k$ and $n-k$, respectively. Then

$$p_{n,k} = \binom{n}{k}\,\frac{k^{k-1}(n-k)^{n-k-1}}{2(n-1)\,n^{n-2}}. \tag{2}$$

The binomial coefficient $\binom{n}{k}$ accounts for the labelling of the left and right subtrees, and the quantity $k^{k-1}(n-k)^{n-k-1}$ is the number of pairs of rooted trees of sizes $k$ and $n-k$ on the two label sets. In the denominator, $n^{n-2}$ is the number of free trees, $n-1$ is the number of edges of the initial tree, and finally the factor 2 corresponds to the random orientation of the selected edge. It is convenient to write this probability in the form

$$p_{n,k} = \frac{n}{2(n-1)}\,\frac{c_k\,c_{n-k}}{c_n},$$

where, for all $k \ge 1$,

$$c_k = \frac{k^{k-1}}{k!}.$$
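Formula (2) and its rewritten form can be checked in exact rational arithmetic for small $n$. This is a minimal sketch with hypothetical helper names; it also checks the identity $\frac{2(n-1)}{n}c_n = \sum_{j=1}^{n-1}c_jc_{n-j}$, which is exactly the statement that the $p_{n,k}$ sum to 1.

```python
from fractions import Fraction
from math import comb, factorial


def c(k):
    """c_k = k^{k-1} / k!"""
    return Fraction(k ** (k - 1), factorial(k))


def p(n, k):
    """p_{n,k} from formula (2)."""
    return Fraction(comb(n, k) * k ** (k - 1) * (n - k) ** (n - k - 1),
                    2 * (n - 1) * n ** (n - 2))


def p_via_c(n, k):
    """The same probability written as n/(2(n-1)) * c_k c_{n-k} / c_n."""
    return Fraction(n, 2 * (n - 1)) * c(k) * c(n - k) / c(n)


for n in range(2, 12):
    assert sum(p(n, k) for k in range(1, n)) == 1
    assert all(p(n, k) == p_via_c(n, k) for k in range(1, n))
    assert Fraction(2 * (n - 1), n) * c(n) == sum(c(j) * c(n - j) for j in range(1, n))

print([p(4, k) for k in range(1, 4)])  # [Fraction(3, 8), Fraction(1, 4), Fraction(3, 8)]
```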



Let us start with the average of the cost function, $a_n := \mathbb{E}(X_n)$, $n \ge 1$, which is obtained recursively by conditioning on the size of $L_n$:

$$\begin{aligned} a_n &= \mathbb{E}\big[\mathbb{E}_{L_n}(X_{L_n} + X_{n-L_n} + n^2)\big] = \mathbb{E}\big(a_{L_n} + a_{n-L_n}\big) + n^2 \\ &= \sum_{j=1}^{n-1} p_{n,j}\,(a_j + a_{n-j}) + n^2. \end{aligned}$$



This recurrence can be rewritten as

$$\frac{n-1}{n}\,c_n\,a_n = \sum_{j=1}^{n-1} c_j\,a_j\,c_{n-j} + \frac{n-1}{n}\,c_n\,b_n, \tag{3}$$

with $b_n = n^2$.



Remark. We replaced $n^2$ by $b_n$ to exhibit the general form of the generating function, so that any toll function can be considered in place of $n^2$.
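As a consistency check of (1) to (3), the expected costs can be computed exactly from the recurrence for $a_n$. This is our own sketch (names hypothetical); it takes $a_1 = 0$, i.e. no cost for a single node, and keeps the toll as a parameter so that other choices of $b_n$ can be tried, in the spirit of the remark above.

```python
from fractions import Fraction
from math import comb, factorial


def c(k):
    return Fraction(k ** (k - 1), factorial(k))


def expected_costs(nmax, toll=lambda n: n * n):
    """a_n = E(X_n), n = 1..nmax, from a_n = b_n + sum_j p_{n,j} (a_j + a_{n-j}), a_1 = 0."""
    a = [Fraction(0)] * (nmax + 1)
    for n in range(2, nmax + 1):
        denom = 2 * (n - 1) * n ** (n - 2)
        s = Fraction(0)
        for j in range(1, n):
            p_nj = Fraction(comb(n, j) * j ** (j - 1) * (n - j) ** (n - j - 1), denom)
            s += p_nj * (a[j] + a[n - j])
        a[n] = toll(n) + s
    return a


a = expected_costs(10)
print(a[2], a[3], a[4])  # 4 13 111/4

# Recurrence (3): (n-1)/n c_n a_n = sum_j c_j a_j c_{n-j} + (n-1)/n c_n b_n
for n in range(2, 11):
    lhs = Fraction(n - 1, n) * c(n) * a[n]
    rhs = sum(c(j) * a[j] * c(n - j) for j in range(1, n)) + Fraction(n - 1, n) * c(n) * n * n
    assert lhs == rhs
```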
Definition 2. The Hadamard product of two power series $F(z) = \sum_n f_n z^n$ and $G(z) = \sum_n g_n z^n$, denoted $F(z)\odot G(z)$, is the power series defined by

$$(F\odot G)(z) = F(z)\odot G(z) := \sum_n f_n\,g_n\,z^n.$$
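On truncated power series, the Hadamard product is simply a coefficient-wise product. A tiny sketch (the function name is ours), applied to the two series used most below: the toll series $B(z) = \sum n^2 z^n$ and the tree series $C(z) = \sum c_n z^n$.

```python
from fractions import Fraction
from math import factorial


def hadamard(f, g):
    """Coefficient-wise product of two truncated power series (lists of coefficients)."""
    return [x * y for x, y in zip(f, g)]


N = 6
B = [n * n for n in range(N)]  # b_n = n^2, with b_0 = 0
C = [Fraction(0)] + [Fraction(n ** (n - 1), factorial(n)) for n in range(1, N)]  # c_n
print(hadamard(B, C))  # coefficients n^2 c_n
```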

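Multiplying the equality (3) by $z^n/e^n$ and summing over $n \ge 1$, we get

\begin{align}
&A(z)\odot C(z/e) - \int_0^z \big(A(w)\odot C(w/e)\big)\,\frac{dw}{w} \tag{4}\\
&\qquad = \big(A(z)\odot C(z/e)\big)\,C(z/e) + \sum_{n\ge 1}\frac{n-1}{n}\,c_n\,b_n\,\frac{z^n}{e^n}, \tag{5}
\end{align}

where $A(z)$ and $C(z)$ denote the ordinary generating functions of $(a_n)_{n\ge 1}$ and $(c_n)_{n\ge 1}$, respectively.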

In view of a result of Knuth and Pittel [5], we know the singular expansion of $C(z)$ at its dominant singularity $z = e^{-1}$:

$$C(z) = 1 - \sqrt{2}\,(1-ez)^{1/2} + O(|1-ez|). \tag{6}$$

Moreover, $C$ satisfies the functional relation $C(z) = z\,e^{C(z)}$.
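Expansion (6) can be checked numerically. The sketch below is ours: it evaluates $C(x)$ for $0 \le x \le e^{-1}$ as the root in $[0,1]$ of $Te^{-T} = x$ (bisection, since $Te^{-T}$ is increasing on $[0,1]$), and compares $C(z/e)$ with $1 - \sqrt2(1-z)^{1/2}$; the gap should shrink like $|1-z|$. It also compares with a truncated series $\sum c_n x^n$ at a small $x$.

```python
import math


def tree_function(x, iters=200):
    """C(x) for 0 <= x <= 1/e: the root T in [0, 1] of T * exp(-T) = x (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for eps in (1e-2, 1e-3, 1e-4):
    z = 1 - eps
    exact = tree_function(z / math.e)
    approx = 1 - math.sqrt(2) * math.sqrt(eps)
    print(eps, exact, approx, (exact - approx) / eps)  # last column stays bounded

# truncated series sum_{n>=1} c_n x^n with c_n = n^{n-1}/n!, at x = 0.1
series = sum(math.exp((n - 1) * math.log(n) - math.lgamma(n + 1)) * 0.1 ** n for n in range(1, 80))
print(series, tree_function(0.1))
```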

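By differentiation, relation (5) transforms into a first-order linear differential equation, which can be readily solved by the method of variation of constants. Briefly, setting $f(z) := A(z)\odot C(z/e)$ and $t(z) := \sum_n \frac{n-1}{n}\,c_n\,b_n\,e^{-n} z^n$, relation (5) takes the form

$$\int_0^z f(w)\,\frac{dw}{w} = f(z)\big(1 - C(z/e)\big) - t(z). \tag{7}$$

By taking derivatives, we obtain

$$f'(z) = f(z)\,\frac{\frac{1}{z} + \frac{d}{dz}C(z/e)}{1 - C(z/e)} + \frac{t'(z)}{1 - C(z/e)}.$$

On the other hand, the equality $C(z/e) = \frac{z}{e}\,e^{C(z/e)}$ implies

$$\frac{d}{dz}C(z/e) = C(z/e)\left(\frac{1}{z} + \frac{d}{dz}C(z/e)\right),$$

so that the coefficient of $f(z)$ above equals $\frac{\frac{d}{dz}C(z/e)}{C(z/e)\,(1-C(z/e))} = \frac{d}{dz}\log\frac{C(z/e)}{1-C(z/e)}$.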

Assuming now (without loss of generality) the initial condition $a_1 = b_1 = 0$, the solution takes the form

$$A(z)\odot C(z/e) = \frac{C(z/e)}{1-C(z/e)}\int_0^z \Big(\sum_{n\ge 1}(n-1)\,c_n\,b_n\,\frac{w^{n}}{e^{n}}\Big)\frac{dw}{w\,C(w/e)}. \tag{8}$$

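And finally, since $\frac{2(n-1)}{n}\,c_n = \sum_{j=1}^{n-1} c_j\,c_{n-j}$ (equivalently, the probabilities $p_{n,j}$ sum to 1), we have $t(z) = \frac12\,B(z)\odot C(z/e)^2$, and therefore

$$A(z)\odot C(z/e) = \frac{C(z/e)}{1-C(z/e)}\int_0^z \frac{1}{2}\,\frac{d}{dw}\Big[B(w)\odot C(w/e)^2\Big]\,\frac{dw}{C(w/e)}, \tag{9}$$

where $B(w)$ denotes the ordinary generating function of $(b_n)_{n\ge 1}$.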
3 Moments by singularity analysis

Thanks to the singularity analysis technique, we can derive the asymptotics of moments of each order. Singularity analysis is a systematic complex-analytic technique that relates the asymptotic behavior of sequences to the behavior of their generating functions in the vicinity of their singularities. The applicability of singularity analysis rests on a technical condition: $\Delta$-regularity. See [5] |B] for more details.



Definition 3. A function defined by a Taylor series about the origin with radius of convergence equal to 1 is $\Delta$-regular if it can be analytically continued in a domain of the form

$$\Delta(\phi,\eta) := \{z : |z| < 1+\eta,\ |\arg(z-1)| > \phi\},$$

for some $\eta > 0$ and $0 < \phi < \pi/2$. A function $f$ is said to admit a singular expansion at $z = 1$ if it is $\Delta$-regular and if one can find a sequence of complex numbers $(c_j)_{0\le j\le J}$ and an increasing sequence of real numbers $(\alpha_j)_{0\le j\le J}$, satisfying $\alpha_j < A$, where $A$ is a real number, such that the relation

$$f(z) = \sum_{j=0}^{J} c_j\,(1-z)^{\alpha_j} + O(|1-z|^A)$$

holds uniformly in $z \in \Delta(\phi,\eta)$. It is said to satisfy a singular expansion with logarithmic terms if

$$f(z) = \sum_{j=0}^{J} C_j\big(L(z)\big)(1-z)^{\alpha_j} + O(|1-z|^A), \qquad L(z) := \log\frac{1}{1-z},$$

where each $C_j(\cdot)$ is a polynomial.

Recall the definition of the generalized polylogarithm:

Definition 4. For $\alpha$ an arbitrary complex number and $r$ a nonnegative integer, the generalized polylogarithm function $\mathrm{Li}_{\alpha,r}$ is defined for $|z| < 1$ by

$$\mathrm{Li}_{\alpha,r}(z) := \sum_{n\ge 1}\frac{(\log n)^r}{n^\alpha}\,z^n.$$

In particular, $\mathrm{Li}_{1,0}(z) = L(z)$. Moreover, a useful property of generalized polylogarithm functions is

$$\mathrm{Li}_{\alpha,r}\odot\mathrm{Li}_{\beta,s} = \mathrm{Li}_{\alpha+\beta,\,r+s}.$$

The singular expansion of the polylogarithm involves the Riemann zeta function (see, for example, [5, Theorem 4]).
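The polylogarithm facts used below are easy to sanity-check numerically. In this sketch (our own, with hypothetical helpers `li` and `li_coeffs`, truncated at a finite number of terms and restricted to $r = 0$) we check the closed form $\mathrm{Li}_{-2,0}(z) = z(1+z)/(1-z)^3$, which underlies (13) below, and the Hadamard rule $\mathrm{Li}_{\alpha,0}\odot\mathrm{Li}_{\beta,0} = \mathrm{Li}_{\alpha+\beta,0}$ coefficient by coefficient.

```python
import math


def li(alpha, z, terms=5000):
    """Truncated Li_{alpha,0}(z) = sum_{n>=1} z^n / n^alpha, for |z| < 1."""
    return sum(z ** n / n ** alpha for n in range(1, terms + 1))


def li_coeffs(alpha, N):
    """Coefficients [z^n] Li_{alpha,0}(z) for n = 0..N-1."""
    return [0.0] + [n ** -alpha for n in range(1, N)]


z = 0.9
print(li(-2, z), z * (1 + z) / (1 - z) ** 3)  # both about 1710

f, g = li_coeffs(1.5, 20), li_coeffs(-2.0, 20)
h = [x * y for x, y in zip(f, g)]  # Hadamard product of the two coefficient lists
assert all(math.isclose(u, v) for u, v in zip(h, li_coeffs(-0.5, 20)))
```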
Lemma 1. The function $\mathrm{Li}_{\alpha,r}(z)$ is $\Delta$-regular, and for $\alpha \notin \{1, 2, \dots\}$ it satisfies the singular expansion

$$\mathrm{Li}_{\alpha,0}(z) \sim \Gamma(1-\alpha)\,t^{\alpha-1} + \sum_{j\ge 0}\frac{(-1)^j}{j!}\,\zeta(\alpha-j)\,t^j, \tag{10}$$

where

$$t = -\log z = \sum_{\ell\ge 1}\frac{(1-z)^\ell}{\ell}.$$

For $r > 0$, the singular expansion of $\mathrm{Li}_{\alpha,r}$ is obtained by formal differentiation:

$$\mathrm{Li}_{\alpha,r}(z) = (-1)^r\,\frac{\partial^r}{\partial\alpha^r}\,\mathrm{Li}_{\alpha,0}(z).$$

A natural consequence of this lemma (which is a particular case of [H, Lemma 2.6]) is that

$$\mathrm{Li}_{\alpha,0}(z) = \Gamma(1-\alpha)\,(1-z)^{\alpha-1} + O(|1-z|^{\alpha}) + \zeta(\alpha)\,\mathbf{1}_{\alpha>0}, \qquad \alpha < 1. \tag{11}$$



Another result, which is very useful in what follows, is the decomposition of the Hadamard product $(1-z)^a\odot(1-z)^b$ (cf. [5, Proposition 8]).

Lemma 2. For real numbers $a$ and $b$,

$$(1-z)^a\odot(1-z)^b \sim \sum_{k\ge 0}\lambda_k^{(a,b)}\,\frac{(1-z)^{a+b+1+k}}{k!} + \sum_{k\ge 0}\mu_k^{(a,b)}\,\frac{(1-z)^k}{k!},$$

where the coefficients $\lambda$ and $\mu$ are given by

$$\lambda_k^{(a,b)} = \frac{\Gamma(-1-a-b)}{\Gamma(-a)\,\Gamma(-b)}\,\frac{(1+a)_k\,(1+b)_k}{(2+a+b)_k}, \qquad \mu_k^{(a,b)} = \frac{\Gamma(1+a+b)}{\Gamma(1+a)\,\Gamma(1+b)}\,\frac{(-a)_k\,(-b)_k}{(-a-b)_k},$$

where $(x)_k$ is defined as $x(x+1)\cdots(x+k-1)$, for $k$ a nonnegative integer.
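For instance, the constant $2^{-1/2}$ in the proof of Lemma 3 below comes from $\lambda_0^{(-3,1/2)}$. The sketch below (helper names ours) evaluates $\lambda_0^{(-3,1/2)} = -1/8$ and compares the exact coefficients of $(1-z)^{-3}\odot(1-z)^{1/2}$ with those of the singular term $\lambda_0^{(-3,1/2)}(1-z)^{-3/2}$. It relies on the standard formula $[z^n](1-z)^{\alpha} = \Gamma(n-\alpha)/(\Gamma(-\alpha)\Gamma(n+1))$ for $\alpha \notin \{0,1,2,\dots\}$; the analytic part $\mu_k$ is not evaluated here because $\Gamma(1+a)$ has a pole at $a = -3$.

```python
import math


def poch(x, k):
    """Rising factorial (x)_k = x (x+1) ... (x+k-1)."""
    out = 1.0
    for i in range(k):
        out *= x + i
    return out


def lam(k, a, b):
    """lambda_k^{(a,b)} of Lemma 2."""
    return (math.gamma(-1 - a - b) / (math.gamma(-a) * math.gamma(-b))
            * poch(1 + a, k) * poch(1 + b, k) / poch(2 + a + b, k))


def coeff(alpha, n):
    """[z^n] (1-z)^alpha = Gamma(n - alpha) / (Gamma(-alpha) Gamma(n + 1))."""
    return math.exp(math.lgamma(n - alpha) - math.lgamma(n + 1)) / math.gamma(-alpha)


a, b = -3.0, 0.5
l0 = lam(0, a, b)
print(l0)  # about -0.125
# B ~ 2 (1-z)^{-3} and C(z/e)^2 ~ 1 - 2 sqrt(2) (1-z)^{1/2}: leading constant of B (.) C(z/e)^2
print(2 * (-2 * math.sqrt(2)) * l0, 2 ** -0.5)
for n in (100, 1000, 10000):
    ratio = coeff(a, n) * coeff(b, n) / (l0 * coeff(a + b + 1, n))
    print(n, ratio)  # tends to 1
```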

Now, equipped with the singularity analysis toolkit, we are in a position to find the asymptotic average from relation (9).

Lemma 3. The expected value of the total cost induced by the toll $n^2$, in the model of random free trees defined in Section 2, is

$$a_n = \sqrt{\frac{\pi}{8}}\;n^{5/2} + O(n^{2}). \tag{12}$$
Proof. Since $b_n = n^2$, we have $B(z) = \mathrm{Li}_{-2,0}(z)$, and the equality (11) implies

$$B(z) = 2(1-z)^{-3} + O(|1-z|^{-2}). \tag{13}$$

Considering the singular expansion (6) of the generating function of trees, Lemma 2 gives

$$B(z)\odot C(z/e)^2 = 2^{-1/2}(1-z)^{-3/2} + O(|1-z|^{-1}).$$

Consequently, since $1/C(w/e) = 1 + O(|1-w|^{1/2})$,

$$\int_0^z \frac{1}{2}\,\frac{d}{dw}\Big[B(w)\odot C(w/e)^2\Big]\frac{dw}{C(w/e)} = \int_0^z \Big(\frac{3\,(1-w)^{-5/2}}{2^{5/2}} + O(|1-w|^{-2})\Big)dw = \frac{1}{2^{3/2}}(1-z)^{-3/2} + O(|1-z|^{-1}).$$

Finally, by relation (9), we have

$$A(z)\odot C(z/e) = \frac{1}{4}(1-z)^{-2} + O(|1-z|^{-3/2}). \tag{14}$$

Moreover, for $\alpha$ positive, we have (see [5], for example)

$$[z^n](1-z)^{-\alpha} = \binom{n+\alpha-1}{n} = \frac{\Gamma(n+\alpha)}{\Gamma(\alpha)\,\Gamma(n+1)} = \frac{n^{\alpha-1}}{\Gamma(\alpha)}\big(1 + O(1/n)\big), \tag{15}$$

where $[z^n](1-z)^{-\alpha}$ denotes the coefficient of $z^n$ in the power series expansion of $(1-z)^{-\alpha}$. The last equality follows from Stirling's formula. Then, by the expansion (14) and singularity analysis, we obtain

$$c_n e^{-n} a_n = \frac{n}{4\,\Gamma(2)}\big(1 + O(1/n)\big) + O(n^{1/2}).$$

Finally, with $c_n e^{-n} = \frac{1}{\sqrt{2\pi}\,n^{3/2}}\big(1 + O(1/n)\big)$, we obtain (12). □
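Lemma 3 can be illustrated numerically. The sketch below (our own; names hypothetical) computes $a_n$ in floating point from the recurrence for $a_n$, writing $p_{n,j}$ in terms of $h_n := c_n e^{-n}$ to avoid overflow, and prints $a_n/n^{5/2}$ next to $\sqrt{\pi/8} \approx 0.6267$. It also checks the estimate (15) and the fact $c_n e^{-n}\sqrt{2\pi}\,n^{3/2} \to 1$ used at the end of the proof. The relative error decays slowly, so only rough agreement should be expected at moderate $n$.

```python
import math


def log_h(n):
    """log(c_n e^{-n}) with c_n = n^{n-1}/n!."""
    return (n - 1) * math.log(n) - math.lgamma(n + 1) - n


def normalized_means(nmax):
    """a_n / n^{5/2} for n = 0..nmax (index 0 unused), a_n = E(X_n) in floating point."""
    lh = [0.0] + [log_h(n) for n in range(1, nmax + 1)]
    a = [0.0] * (nmax + 1)
    for n in range(2, nmax + 1):
        s = 0.0
        for j in range(1, n):
            # p_{n,j} = n / (2(n-1)) * h_j h_{n-j} / h_n
            p = n / (2 * (n - 1)) * math.exp(lh[j] + lh[n - j] - lh[n])
            s += p * (a[j] + a[n - j])
        a[n] = n * n + s
    return [0.0] + [a[n] / n ** 2.5 for n in range(1, nmax + 1)]


def coeff_minus_alpha(alpha, n):
    """[z^n](1-z)^{-alpha} = Gamma(n+alpha) / (Gamma(alpha) Gamma(n+1)), alpha > 0."""
    return math.exp(math.lgamma(n + alpha) - math.lgamma(alpha) - math.lgamma(n + 1))


target = math.sqrt(math.pi / 8)
r = normalized_means(800)
for n in (10, 50, 200, 800):
    print(n, r[n], r[n] / target)
for alpha in (2.0, 2.5):
    n = 10 ** 5
    print(alpha, coeff_minus_alpha(alpha, n) / (n ** (alpha - 1) / math.gamma(alpha)))  # close to 1
print(math.exp(log_h(1000)) * math.sqrt(2 * math.pi) * 1000 ** 1.5)  # close to 1
```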



Now, to estimate the moments of higher order, we return to the recurrence (1). For $k \ge 0$, $n \ge 1$, put

$$\mu_n(k) := \mathbb{E}(X_n^k) \quad\text{and}\quad \hat\mu_n(k) := c_n\,e^{-n}\,\mu_n(k).$$

Let $M_k(z)$ denote the ordinary generating function of $\hat\mu_n(k)$, with $z$ marking $n$. For $k = 1$,

$$\hat\mu_n(1) = c_n\,e^{-n}\,a_n \quad\text{and}\quad M_1(z) = A(z)\odot C(z/e).$$
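For $k \ge 2$, we have

$$X_n^k = \sum_{k_1+k_2+k_3=k}\binom{k}{k_1,k_2,k_3}\,X_{L_n}^{k_1}\,X_{R_n}^{k_2}\,b_n^{k_3},$$

or again

$$X_n^k = X_{L_n}^k + X_{R_n}^k + \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2<k}}\binom{k}{k_1,k_2,k_3}\,X_{L_n}^{k_1}\,X_{R_n}^{k_2}\,b_n^{k_3}.$$

Conditioning on the size of $L_n$, we obtain

$$\mu_n(k) = \sum_{j=1}^{n-1}p_{n,j}\big(\mu_j(k) + \mu_{n-j}(k)\big) + \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2<k}}\binom{k}{k_1,k_2,k_3}\,b_n^{k_3}\sum_{j=1}^{n-1}p_{n,j}\,\mu_j(k_1)\,\mu_{n-j}(k_2).$$

Multiplying the latter by $\frac{n-1}{n}\,c_n\,e^{-n}$, we obtain

$$\frac{n-1}{n}\,\hat\mu_n(k) = \sum_{j=1}^{n-1}\hat\mu_j(k)\,c_{n-j}\,e^{-(n-j)} + r_n(k), \tag{16}$$

where

$$r_n(k) := \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2<k}}\binom{k}{k_1,k_2,k_3}\,b_n^{k_3}\,\frac{1}{2}\sum_{j=1}^{n-1}\hat\mu_j(k_1)\,\hat\mu_{n-j}(k_2).$$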
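Relation (16) can be verified exactly for small $n$ by computing the full law of $X_n$ from (1). The sketch below is ours (helper names hypothetical). Since every term of (16) carries the same factor $e^{-n}$, that factor is dropped, i.e. $\hat\mu_n(k)$ is replaced by $c_n\mu_n(k)$.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def c(k):
    return Fraction(k ** (k - 1), factorial(k))


def p(n, k):
    return Fraction(n, 2 * (n - 1)) * c(k) * c(n - k) / c(n)


def cost_laws(nmax):
    """Exact law of X_n (dict: value -> probability) for n = 1..nmax, from (1)."""
    law = {1: {0: Fraction(1)}}
    for n in range(2, nmax + 1):
        d = defaultdict(Fraction)
        for j in range(1, n):
            pj = p(n, j)
            for x, px in law[j].items():
                for y, py in law[n - j].items():  # independent given L_n = j
                    d[x + y + n * n] += pj * px * py
        law[n] = dict(d)
    return law


def mu(law, n, k):
    """mu_n(k) = E(X_n^k)."""
    return sum(pr * x ** k for x, pr in law[n].items())


def check_16(law, n, k):
    """Both sides of (16), with the common factor e^{-n} removed."""
    mh = lambda m, q: c(m) * mu(law, m, q)  # c_m mu_m(q) = e^m * muhat_m(q)
    lhs = Fraction(n - 1, n) * mh(n, k)
    rhs = sum(mh(j, k) * c(n - j) for j in range(1, n))
    for k1 in range(k):                 # enforces k1 < k
        for k2 in range(k - k1 + 1):
            if k2 == k:                 # enforces k2 < k
                continue
            k3 = k - k1 - k2
            multinom = factorial(k) // (factorial(k1) * factorial(k2) * factorial(k3))
            half_conv = Fraction(1, 2) * sum(mh(j, k1) * mh(n - j, k2) for j in range(1, n))
            rhs += multinom * (n * n) ** k3 * half_conv
    return lhs, rhs


law = cost_laws(6)
print(mu(law, 4, 1), mu(law, 4, 2))  # 111/4 3099/4
```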

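Let $R_k(z)$ denote the ordinary generating function of $r_n(k)$, with $z$ marking $n$. Therefore

$$R_k(z) = \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2<k}}\binom{k}{k_1,k_2,k_3}\,B(z)^{\odot k_3}\odot\Big[\frac{1}{2}\,M_{k_1}(z)\,M_{k_2}(z)\Big], \tag{17}$$

where

$$B(z)^{\odot k_3} := \underbrace{B(z)\odot\cdots\odot B(z)}_{k_3\ \text{times}},$$

with the convention that $B(z)^{\odot 0} = \sum_{n\ge 0} z^n$, the neutral element for $\odot$.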

Multiplying (16) by $z^n$ and summing over $n \ge 1$, we obtain

$$M_k(z) - \int_0^z M_k(w)\,\frac{dw}{w} = M_k(z)\,C(z/e) + R_k(z),$$

which is the equality (7) with $f(z) = M_k(z)$ and $t(z) = R_k(z)$. Hence, the solution of this equation is

$$M_k(z) = \frac{C(z/e)}{1-C(z/e)}\int_0^z R_k'(w)\,\frac{dw}{C(w/e)}. \tag{18}$$
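Proposition 1. For $k \ge 1$, the generating function $M_k(z)$ of $\hat\mu_n(k)$ satisfies

$$M_k(z) = \frac{1}{\sqrt{2}}\,A_k\,(1-z)^{-5k/2+1/2} + O(|1-z|^{-5k/2+1}), \tag{19}$$

where $A_1 = 2^{-3/2}$ and the coefficients $A_k$, $k \ge 2$, are defined by the recurrence

$$A_k = \frac{1}{4}\sum_{j=1}^{k-1}\binom{k}{j}A_j\,A_{k-j} + \frac{k}{\sqrt{2}}\,\frac{\Gamma(5k/2-1)}{\Gamma(5k/2-3)}\,A_{k-1}. \tag{20}$$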
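The recurrence (20) is straightforward to iterate. The following sketch (ours; it assumes (20) and the normalization (19) exactly as displayed, with $A_1 = 2^{-3/2}$) computes the first $A_k$ and the numbers $\sqrt{\pi}A_k/\Gamma(\frac{5k-1}{2})$, which play the role of limiting moments of $X_n/n^{5/2}$ in the proof of Theorem 1; the first one should match the constant $\sqrt{\pi/8}$ of Lemma 3.

```python
import math


def A_seq(kmax):
    """A_1 = 2^{-3/2}; A_k from recurrence (20)."""
    A = [0.0, 2 ** -1.5]
    for k in range(2, kmax + 1):
        conv = sum(math.comb(k, j) * A[j] * A[k - j] for j in range(1, k))
        gamma_ratio = (5 * k / 2 - 2) * (5 * k / 2 - 3)  # Gamma(5k/2 - 1) / Gamma(5k/2 - 3)
        A.append(conv / 4 + k / math.sqrt(2) * gamma_ratio * A[k - 1])
    return A


def limit_moment(k, A):
    """sqrt(pi) A_k / Gamma((5k-1)/2)."""
    return math.sqrt(math.pi) * A[k] / math.gamma((5 * k - 1) / 2)


A = A_seq(6)
print([round(limit_moment(k, A), 4) for k in range(1, 7)])
print(limit_moment(1, A), math.sqrt(math.pi / 8))
```

As a plausibility check, the sequence $m_k^{1/k}$ should be nondecreasing (Lyapunov's inequality), as it must be for the moments of a nonnegative random variable.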



Proof. The proof is carried out by induction. For $k = 1$, the proposition has been established in view of (14). For $k \ge 2$, we show that $R_k(z)$ has a singular expansion of the form

$$R_k(z) = A_k\,(1-z)^{-5k/2+1} + O(|1-z|^{-5k/2+3/2}). \tag{21}$$

Analyzing the various terms on the right-hand side of (17), we observe that the $A_k$ are defined by the recurrence (20):



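(I) By the induction hypothesis, when $k_1$ and $k_2$ are both nonzero and $k_3 = 0$, the contribution to $R_k(z)$ is $\binom{k}{k_1}$ times

$$\begin{aligned}\frac{1}{2}M_{k_1}(z)M_{k_2}(z) &= \frac{1}{2}\Big[\frac{A_{k_1}}{\sqrt2}(1-z)^{-\frac{5k_1}{2}+\frac12} + O(|1-z|^{-\frac{5k_1}{2}+1})\Big]\times\Big[\frac{A_{k_2}}{\sqrt2}(1-z)^{-\frac{5k_2}{2}+\frac12} + O(|1-z|^{-\frac{5k_2}{2}+1})\Big]\\ &= \frac{1}{4}A_{k_1}A_{k_2}\,(1-z)^{-5k/2+1} + O(|1-z|^{-5k/2+3/2}).\end{aligned}$$

(II) When $k_1$, $k_2$ and $k_3$ are all nonzero, by relation (11) and the relation

$$\frac{1}{2}M_{k_1}(z)M_{k_2}(z) = \frac{A_{k_1}A_{k_2}}{4\,\Gamma\big(\frac{5(k_1+k_2)}{2}-1\big)}\,\mathrm{Li}_{2-\frac{5(k_1+k_2)}{2},0}(z) + O(|1-z|^{-\frac{5(k_1+k_2)}{2}+\frac32}),$$

and since $B(z)^{\odot k_3} = \mathrm{Li}_{-2k_3,0}(z)$, the contribution to $R_k(z)$ is a multiple of

$$\begin{aligned}\mathrm{Li}_{-2k_3,0}(z)\odot\Big[\frac12 M_{k_1}(z)M_{k_2}(z)\Big] &= \frac{A_{k_1}A_{k_2}}{4\,\Gamma\big(\frac{5(k_1+k_2)}{2}-1\big)}\,\mathrm{Li}_{2-\frac{5(k_1+k_2)}{2}-2k_3,0}(z) + \mathrm{Li}_{-2k_3,0}(z)\odot O(|1-z|^{-\frac{5(k_1+k_2)}{2}+\frac32})\\ &= O(|1-z|^{-5k/2+3/2}).\end{aligned}$$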
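(III) Consider now the case where $k_1$ is nonzero and $k_2 = 0$, so that $k_3 = k - k_1 \ge 1$. We have $M_0(z) = C(z/e)$. The contribution to $R_k(z)$ is $\binom{k}{k_3}$ times $\mathrm{Li}_{-2k_3,0}(z)\odot\big[\frac12 M_{k_1}(z)M_0(z)\big]$, where

$$\begin{aligned}\frac{1}{2}M_{k_1}(z)M_0(z) &= \frac{1}{2}\Big[\frac{A_{k_1}}{\sqrt2}(1-z)^{-\frac{5k_1}{2}+\frac12} + O(|1-z|^{-\frac{5k_1}{2}+1})\Big]\Big[1 - \sqrt2\,(1-z)^{1/2} + O(|1-z|)\Big]\\ &= \frac{A_{k_1}}{2\sqrt2\,\Gamma\big(\frac{5k_1}{2}-\frac12\big)}\,\mathrm{Li}_{\frac32-\frac{5k_1}{2},0}(z) + O(|1-z|^{-\frac{5k_1}{2}+1}).\end{aligned}$$

Since

$$\mathrm{Li}_{-2k_3,0}(z)\odot\Big[\frac12 M_{k_1}(z)M_0(z)\Big] = \frac{A_{k_1}}{2\sqrt2\,\Gamma\big(\frac{5k_1}{2}-\frac12\big)}\,\mathrm{Li}_{\frac32-\frac{5k_1}{2}-2k_3,0}(z) + \mathrm{Li}_{-2k_3,0}(z)\odot O(|1-z|^{-\frac{5k_1}{2}+1}),$$

the contribution to $R_k(z)$, for $k_3 \ge 2$, is

$$O(|1-z|^{-5k/2+1/2+k_3/2}) = O(|1-z|^{-5k/2+3/2}).$$

(IV) In the case where $k_1$ is nonzero, $k_2 = 0$ and $k_3 = 1$, the contribution to $R_k(z)$ is $\binom{k}{k-1} = k$ times

$$\frac{A_{k-1}\,\Gamma\big(\frac{5k}{2}-1\big)}{2\sqrt2\,\Gamma\big(\frac{5k}{2}-3\big)}\,(1-z)^{-5k/2+1} + O(|1-z|^{-5k/2+3/2}).$$

(V) The case where $k_2$ is nonzero and $k_1 = 0$ is identical to the two preceding cases.

(VI) The last contribution comes from the single term in which both $k_1$ and $k_2$ are zero (so $k_3 = k$). In this case, the contribution to $R_k(z)$ is

$$\begin{aligned}B(z)^{\odot k}\odot\Big[\frac12\,C(z/e)^2\Big] &= \mathrm{Li}_{-2k,0}(z)\odot\Big(\frac12 - \sqrt2\,(1-z)^{1/2} + O(|1-z|)\Big)\\ &= \mathrm{Li}_{-2k,0}(z)\odot\Big(-\frac{\sqrt2}{\Gamma(-1/2)}\,\mathrm{Li}_{3/2,0}(z) + O(1)\Big)\\ &= O(|1-z|^{-2k+1/2}) = O(|1-z|^{-5k/2+3/2}),\end{aligned}$$

the last equality holding because $k \ge 2$.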



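Adding all these six contributions yields the expansion (21), as well as the recurrence formula (20). Finally, (21) gives $\int_0^z R_k'(w)\,\frac{dw}{C(w/e)} = A_k(1-z)^{-5k/2+1} + O(|1-z|^{-5k/2+3/2})$, and (6) gives $\frac{C(z/e)}{1-C(z/e)} = 2^{-1/2}(1-z)^{-1/2} + O(1)$; using both in (18), we obtain the expansion (19). □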
4 Proof of Theorem 1

According to Proposition 1, the generating function $M_k(z)$ of $(c_n e^{-n}\mu_n(k))_{n\ge 1}$ has the singular expansion

$$M_k(z) = \frac{1}{\sqrt2}\,A_k\,(1-z)^{-5k/2+1/2} + O(|1-z|^{-5k/2+1}),$$

where the $A_k$ satisfy the recurrence (20). Thus, since

$$c_n e^{-n} = \frac{1}{\sqrt{2\pi}\,n^{3/2}}\big(1 + O(1/n)\big),$$

in view of (15) and the techniques of singularity analysis, we obtain

$$\mu_n(k) = \frac{\sqrt{\pi}\,A_k}{\Gamma\big(\frac{5k-1}{2}\big)}\,n^{5k/2} + O(n^{5k/2-1/2}). \tag{22}$$

We will use this estimate of the $k$-th moment to derive the limit distribution of our additive functional. From (22) we obtain, for $k \ge 1$,



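$$\mathbb{E}\Big[\big(n^{-5/2}X_n\big)^k\Big] = \frac{\sqrt{\pi}\,A_k}{\Gamma\big(\frac{5k-1}{2}\big)} + O(n^{-1/2}). \tag{23}$$

Once we prove the following lemma, the hypothesis of [1, Theorem 30.1] is verified, and we can be sure that the sequence $\big(\sqrt{\pi}\,A_k/\Gamma(\frac{5k-1}{2})\big)_{k\ge 1}$ characterizes a unique probability law.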

Lemma 4. There exists a constant $C < \infty$ such that

$$\frac{A_k}{k!} \le C^k\,k^{5k/2},$$

for all $k \ge 1$.



Proof. The proof is by induction. For $k \in \{1, 2\}$, the inequality is satisfied if we choose the constant $C$ sufficiently large. For $k > 2$, putting $s_k := A_k/k!$ and dividing the recurrence (20) by $k!$, we obtain

$$s_k = \frac{1}{4}\sum_{j=1}^{k-1}s_j\,s_{k-j} + \frac{1}{\sqrt2}\Big(\frac{5k}{2}-2\Big)\Big(\frac{5k}{2}-3\Big)s_{k-1} \le \frac{1}{4}\sum_{j=1}^{k-1}s_j\,s_{k-j} + \gamma\,k^2\,s_{k-1},$$

for $\gamma = 25/4$. By the induction hypothesis,

$$s_k \le \frac{C^k}{4}\sum_{j=1}^{k-1}\big[j^j(k-j)^{k-j}\big]^{5/2} + \gamma\,C^{k-1}(k-1)^{5(k-1)/2}\,k^2.$$

Since, for $1 \le j \le k/2$, the term $j^j(k-j)^{k-j}$ decreases as $j$ grows, and the summand is unchanged when $j$ is replaced by $k-j$, each term of the sum is at most $(k-1)^{5(k-1)/2}$. Then

$$s_k \le \frac{C^k}{4}(k-1)^{(5k-3)/2} + \gamma\,C^{k-1}k^{(5k-1)/2} \le \Big(\frac14 + \frac{\gamma}{C}\Big)C^k\,k^{5k/2} \le C^k\,k^{5k/2},$$

where the last inequality holds as soon as $C \ge 4\gamma/3$. □
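A numerical look at Lemma 4 (our sketch, not part of the original argument): iterate the recurrence for $s_k = A_k/k!$ obtained by dividing (20) by $k!$, and compare $\log s_k$ with $k\log C + \frac{5k}{2}\log k$ for the choice $C = 4\gamma/3 = 25/3$. The constant is our choice for this check; logarithms keep the comparison well inside floating-point range.

```python
import math

GAMMA = 25 / 4
C = 4 * GAMMA / 3


def s_seq(kmax):
    """s_k = A_k / k!, from (20) divided by k!, with s_1 = 2^{-3/2}."""
    s = [0.0, 2 ** -1.5]
    for k in range(2, kmax + 1):
        conv = sum(s[j] * s[k - j] for j in range(1, k))
        s.append(conv / 4 + (5 * k / 2 - 2) * (5 * k / 2 - 3) * s[k - 1] / math.sqrt(2))
    return s


s = s_seq(40)
for k in (1, 5, 10, 20, 40):
    bound = k * math.log(C) + 2.5 * k * math.log(k)
    print(k, math.log(s[k]), bound)  # log s_k stays below the log of the bound
```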
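It follows from Lemma 4 and Stirling's formula that, for $B$ sufficiently large,

$$\frac{\sqrt{\pi}\,A_k}{k!\,\Gamma\big(\frac{5k-1}{2}\big)} \le B^k, \tag{24}$$

and, by [1, Theorem 30.1], there exists a unique probability distribution having the moments $\sqrt{\pi}\,A_k/\Gamma(\frac{5k-1}{2})$. Let $Y$ be a random variable having this probability distribution. We deduce, by the method of moments, that

$$\frac{X_n}{n^{5/2}} \xrightarrow{\ \mathcal{L}\ } Y.$$

Putting $\xi = Y/\sqrt{2}$ and $a_k = 2^{3k-1}A_k/k!$, we obtain

$$\mathbb{E}(\xi^k) = \frac{\sqrt{\pi}\,k!}{2^{(7k-2)/2}\,\Gamma\big(\frac{5k-1}{2}\big)}\,a_k$$

and

$$a_k = \sqrt{2}\,(5k-6)(5k-4)\,a_{k-1} + \frac12\sum_{j=1}^{k-1}a_j\,a_{k-j}, \quad k \ge 2; \qquad a_1 = \sqrt{2},$$

which is the statement of Theorem 1.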
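The change of normalization at the end of the proof can be checked numerically. Assuming (20) with $A_1 = 2^{-3/2}$, the sketch below (ours, with hypothetical names) forms $a_k = 2^{3k-1}A_k/k!$ and verifies that it satisfies $a_k = \sqrt2(5k-6)(5k-4)a_{k-1} + \frac12\sum_{j=1}^{k-1}a_ja_{k-j}$ with $a_1 = \sqrt2$, and that $2^{k/2}\,\mathbb{E}(\xi^k) = \sqrt{\pi}A_k/\Gamma(\frac{5k-1}{2})$, consistent with $\xi = Y/\sqrt2$.

```python
import math


def A_seq(kmax):
    """A_1 = 2^{-3/2}; A_k from recurrence (20)."""
    A = [0.0, 2 ** -1.5]
    for k in range(2, kmax + 1):
        conv = sum(math.comb(k, j) * A[j] * A[k - j] for j in range(1, k))
        A.append(conv / 4 + k / math.sqrt(2) * (5 * k / 2 - 2) * (5 * k / 2 - 3) * A[k - 1])
    return A


def theorem1_rhs(a, k):
    """Right-hand side of the recurrence for a_k in Theorem 1."""
    conv = sum(a[j] * a[k - j] for j in range(1, k))
    return math.sqrt(2) * (5 * k - 6) * (5 * k - 4) * a[k - 1] + 0.5 * conv


K = 10
A = A_seq(K)
a = [0.0] + [2 ** (3 * k - 1) * A[k] / math.factorial(k) for k in range(1, K + 1)]

print(a[1], math.sqrt(2))
for k in range(2, K + 1):
    print(k, a[k], theorem1_rhs(a, k))  # the two columns agree up to rounding
```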